PStream: A Popularity-Aware Differentiated Distributed Stream Processing System
Abstract
Real-world stream data with skewed distributions raises unique challenges to distributed stream processing systems. Existing stream workload partitioning schemes usually use a “one size fits all” design, which leverages either a shuffle grouping or a key grouping strategy for partitioning the stream workloads among multiple processing units, leading to notable problems of unsatisfactory system throughput and processing latency. In this article, we show that key grouping based schemes result in serious load imbalance and low computation efficiency in the presence of data skewness, while shuffle grouping schemes are not scalable in terms of memory space. We argue that the key to efficient stream scheduling is the popularity of the stream data. We propose PStream, a popularity-aware differentiated distributed stream processing system that assigns the hot keys using shuffle grouping and the rare ones using key grouping. PStream leverages a novel light-weighted probabilistic counting scheme for identifying the currently hot keys in dynamic real-time streams. The scheme is extremely efficient in computation and memory consumption, so the predictor based on it can be well integrated into processing instances in the system. We further design an adaptive threshold configuration scheme, which can quickly adapt to the dynamical popularity changes in highly dynamic real-time streams. We implement PStream on top of Apache Storm and conduct comprehensive experiments using large-scale traces from real-world systems to evaluate the performance of this design. Results show that PStream achieves a 2.3× improvement in processing throughput and reduces the processing latency by 64 percent compared to state-of-the-art designs.
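The differentiated routing idea in the abstract can be illustrated with a minimal Python sketch. It is not the PStream implementation. A Count-Min sketch stands in for the paper's probabilistic counting scheme, and a fixed threshold stands in for the adaptive threshold configuration. The round-robin dispatch is a simplification of shuffle grouping, and all class names and parameters (`CountMinSketch`, `PopularityAwarePartitioner`, `hot_threshold`) are hypothetical.

```python
import hashlib
from itertools import count


class CountMinSketch:
    """Approximate per-key counter with fixed memory (assumed stand-in
    for the paper's probabilistic counting scheme)."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _slots(self, key):
        # One independent hash per row, derived by salting with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                key.encode(), digest_size=8, salt=bytes([row])
            ).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, key):
        for row, col in self._slots(key):
            self.rows[row][col] += 1

    def estimate(self, key):
        # The minimum over rows never underestimates the true count.
        return min(self.rows[row][col] for row, col in self._slots(key))


class PopularityAwarePartitioner:
    """Routes hot keys by shuffle grouping and rare keys by key grouping."""

    def __init__(self, num_workers, hot_threshold):
        self.num_workers = num_workers
        self.hot_threshold = hot_threshold  # fixed here; adaptive in the paper
        self.sketch = CountMinSketch()
        self._round_robin = count()

    def route(self, key):
        self.sketch.add(key)
        if self.sketch.estimate(key) >= self.hot_threshold:
            # Hot key: spread tuples over all workers (shuffle grouping).
            return next(self._round_robin) % self.num_workers
        # Rare key: always the same worker (key grouping).
        digest = hashlib.blake2b(key.encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.num_workers


if __name__ == "__main__":
    partitioner = PopularityAwarePartitioner(num_workers=4, hot_threshold=10)
    stream = ["hot"] * 50 + ["rare-a", "rare-b", "rare-a"]
    for key in stream:
        print(key, "->", partitioner.route(key))
```

In this sketch, a key is pinned to one worker until its estimated count reaches the threshold. After that its tuples are spread across all workers, which balances load but means downstream state for that key has to be merged across workers.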
Similar resources
Latency-aware Elastic Scaling for Distributed Data Stream Processing
Elastic scaling allows a data stream processing system to react to a dynamically changing query or event workload by automatically scaling in or out. Thereby, both unpredictable load peaks as well as underload situations can be handled. However, each scaling decision comes with a latency penalty due to the required operator movements. Therefore, in practice an elastic system might be able to im...
Synergy: Sharing-Aware Component Composition for Distributed Stream Processing Systems
Many emerging on-line data analysis applications require applying continuous query operations such as correlation, aggregation, and filtering to data streams in real-time. Distributed stream processing systems allow in-network stream processing to achieve better scalability and quality-of-service (QoS) provision. In this paper we present Synergy, a distributed stream processing middleware that ...
Session-Aware Popularity Resource Allocation for Assured Differentiated Services
Differentiated Service networks (DS) are fair in the way that different types of traffic can be associated to different network services, and so to different quality levels. However, fairness among flows sharing the same service may not be provided. Our goal is to study fairness between multirate multimedia sessions for an assured DS service, in a multicast network environment. To achieve this ...
Fault-tolerant stream processing using a distributed, replicated file system
We present SGuard, a new fault-tolerance technique for distributed stream processing engines (SPEs) running in clusters of commodity servers. SGuard is less disruptive to normal stream processing and leaves more resources available for normal stream processing than previous proposals. Like several previous schemes, SGuard is based on rollback recovery [18]: it checkpoints the state of stream pr...
Distributed Reactive Stream Processing
The reactive programming paradigm successfully overcomes the limitations of the observer pattern, which has traditionally been used for developing event-driven distributed systems. Due to its declarative style, compositionality and automatic management of dependencies, reactive programming offers a promising new way for building complex distributed data-flow systems. This article outlines some open chal...
Journal
Journal title: IEEE Transactions on Computers
Year: 2021
ISSN: 1557-9956, 2326-3814, 0018-9340
DOI: https://doi.org/10.1109/tc.2020.3019689